Robotics 36
☆ Enhancing Underwater Navigation through Cross-Correlation-Aware Deep INS/DVL Fusion
The accurate navigation of autonomous underwater vehicles critically depends
on the precision of Doppler velocity log (DVL) velocity measurements. Recent
advancements in deep learning have demonstrated significant potential in
improving DVL outputs by leveraging spatiotemporal dependencies across multiple
sensor modalities. However, integrating these estimates into model-based
filters, such as the extended Kalman filter, introduces statistical
inconsistencies, most notably cross-correlations between process and
measurement noise. This paper addresses this challenge by proposing a
cross-correlation-aware deep INS/DVL fusion framework. Building upon BeamsNet,
a convolutional neural network designed to estimate AUV velocity using DVL and
inertial data, we integrate its output into a navigation filter that explicitly
accounts for the cross-correlation induced between the noise sources. This
approach improves filter consistency and better reflects the underlying sensor
error structure. Evaluated on two real-world underwater trajectories, the
proposed method outperforms both least squares and cross-correlation-neglecting
approaches in terms of state uncertainty. Notably, improvements exceed 10% in
velocity and misalignment angle confidence metrics. Beyond demonstrating
empirical performance, this framework provides a theoretically principled
mechanism for embedding deep learning outputs within stochastic filters.
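The filter modification this abstract points to can be illustrated with the classical correlated-noise Kalman measurement update, in which the cross-covariance S = E[w vᵀ] between process and measurement noise enters both the gain and the innovation covariance. A minimal NumPy sketch under generic notation (the function and shapes are illustrative, not the paper's implementation):

```python
import numpy as np

def kf_update_correlated(x, P, z, H, R, S):
    """Kalman measurement update when the process noise w and measurement
    noise v are cross-correlated, with S = E[w v^T] (shape n x m).
    Setting S = 0 recovers the standard update."""
    C = P @ H.T + S                               # state/measurement cross-covariance
    Sigma = H @ P @ H.T + H @ S + S.T @ H.T + R   # innovation covariance with cross terms
    K = C @ np.linalg.inv(Sigma)                  # cross-correlation-aware gain
    x_new = x + K @ (z - H @ x)                   # corrected state estimate
    P_new = P - K @ Sigma @ K.T                   # corrected covariance
    return x_new, P_new
```

Neglecting S (as a cross-correlation-ignoring fusion would) biases both the gain and the reported covariance, which is the inconsistency the proposed framework accounts for.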
☆ Dataset and Analysis of Long-Term Skill Acquisition in Robot-Assisted Minimally Invasive Surgery
Objective: We aim to investigate long-term robotic surgical skill acquisition
among surgical residents and the effects of training intervals and fatigue on
performance. Methods: Over six months, surgical residents completed monthly
training sessions surrounding a single 26-hour hospital shift, with one
session scheduled before, one during, and one after the shift. In each
training session, they performed three dry-lab
training tasks: Ring Tower Transfer, Knot-Tying, and Suturing. We collected a
comprehensive dataset, including videos synchronized with kinematic data,
activity tracking, and scans of the suturing pads. Results: We collected a
dataset of 972 trials performed by 18 residents of different surgical
specializations. Participants demonstrated consistent performance improvement
across all tasks. In addition, we found variations in between-shift learning
and forgetting across metrics and tasks, and hints for possible effects of
fatigue. Conclusion: The findings from our first analysis shed light on the
long-term learning processes of robotic surgical skills with extended intervals
and varying levels of fatigue. Significance: This study lays the groundwork for
future research aimed at optimizing training protocols and enhancing AI
applications in surgery, ultimately contributing to improved patient outcomes.
The dataset will be made available upon acceptance of our journal submission.
comment: 12 pages, 8 figures
☆ Cooking Task Planning using LLM and Verified by Graph Network
Cooking tasks remain a challenging problem for robotics due to their
complexity. Videos of people cooking are a valuable source of information for
such tasks, but they introduce considerable variability in how this data
translates to a robotic environment. This research aims to streamline this
process, focusing on the task plan generation step, by using a Large Language
Model (LLM)-based Task and Motion Planning (TAMP) framework to autonomously
generate cooking task plans from videos with subtitles, and execute them.
Conventional LLM-based task planning methods are not well-suited for
interpreting cooking video data due to uncertainty in the videos and the risk
of hallucination in their output. To address both of these problems, we
explore using LLMs in combination with Functional Object-Oriented Networks
(FOON), to validate the plan and provide feedback in case of failure. This
combination can generate task sequences with manipulation motions that are
logically correct and executable by a robot. We compare the execution of the
generated plans for 5 cooking recipes from our approach against the plans
generated by a few-shot LLM-only approach on a dual-arm robot setup. The robot
successfully executed 4 of the 5 plans generated by our approach, whereas only
1 of the plans generated solely by the LLM could be executed.
☆ Data-Driven Contact-Aware Control Method for Real-Time Deformable Tool Manipulation: A Case Study in the Environmental Swabbing
Deformable Object Manipulation (DOM) remains a critical challenge in robotics
due to the complexities of developing suitable model-based control strategies.
Deformable Tool Manipulation (DTM) further complicates this task by introducing
additional uncertainties between the robot and its environment. While humans
effortlessly manipulate deformable tools using touch and experience, robotic
systems struggle to maintain stability and precision. To address these
challenges, we present a novel State-Adaptive Koopman LQR (SA-KLQR) control
framework for real-time deformable tool manipulation, demonstrated through a
case study in environmental swab sampling for food safety. This method
leverages Koopman operator-based control to linearize nonlinear dynamics while
adapting to state-dependent variations in tool deformation and contact forces.
A tactile-based feedback system dynamically estimates and regulates the swab
tool's angle, contact pressure, and surface coverage, ensuring compliance with
food safety standards. Additionally, a sensor-embedded contact pad monitors
force distribution to mitigate tool pivoting and deformation, improving
stability during dynamic interactions. Experimental results validate the
SA-KLQR approach, demonstrating accurate contact angle estimation, robust
trajectory tracking, and reliable force regulation. The proposed framework
enhances precision, adaptability, and real-time control in deformable tool
manipulation, bridging the gap between data-driven learning and optimal control
in robotic interaction tasks.
comment: Submitted for Journal Review
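The generic pipeline behind a Koopman-operator LQR can be sketched in two data-driven steps: fit lifted linear dynamics z' ≈ Az + Bu by least squares (EDMD with control), then compute a discrete LQR gain on the fitted model. This is a simplified sketch under our own assumptions — the paper's SA-KLQR additionally adapts to state-dependent deformation and contact forces, which is not reproduced here:

```python
import numpy as np

def fit_koopman(X, U, Xn, lift):
    """EDMD-with-control: least-squares fit of lifted dynamics z' ~ A z + B u.
    X, Xn are (N, nx) successive states, U is (N, nu); lift maps a state to
    its vector of observables."""
    Z = np.array([lift(x) for x in X]).T    # (nz, N) lifted states
    Zn = np.array([lift(x) for x in Xn]).T  # (nz, N) lifted successors
    G = np.vstack([Z, U.T])                 # stacked regressors [z; u]
    AB = Zn @ np.linalg.pinv(G)             # least-squares solution [A B]
    nz = Z.shape[0]
    return AB[:, :nz], AB[:, nz:]

def dlqr(A, B, Q, R, iters=1000):
    """Discrete-time LQR gain (u = -K z) via fixed-point Riccati iteration."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K
```

With a trivial (identity) lifting this degenerates to DMD with control; richer observable dictionaries are what let the linear surrogate capture nonlinear tool dynamics.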
☆ STAMICS: Splat, Track And Map with Integrated Consistency and Semantics for Dense RGB-D SLAM
Simultaneous Localization and Mapping (SLAM) is a critical task in robotics,
enabling systems to autonomously navigate and understand complex environments.
Current SLAM approaches predominantly rely on geometric cues for mapping and
localization, but they often fail to ensure semantic consistency, particularly
in dynamic or densely populated scenes. To address this limitation, we
introduce STAMICS, a novel method that integrates semantic information with 3D
Gaussian representations to enhance both localization and mapping accuracy.
STAMICS consists of three key components: a 3D Gaussian-based scene
representation for high-fidelity reconstruction, a graph-based clustering
technique that enforces temporal semantic consistency, and an open-vocabulary
system that allows for the classification of unseen objects. Extensive
experiments show that STAMICS significantly improves camera pose estimation and
map quality, outperforming state-of-the-art methods while reducing
reconstruction errors. Code will be made publicly available.
☆ Neuro-Symbolic Imitation Learning: Discovering Symbolic Abstractions for Skill Learning ICRA
Imitation learning is a popular method for teaching robots new behaviors.
However, most existing methods focus on teaching short, isolated skills rather
than long, multi-step tasks. To bridge this gap, imitation learning algorithms
must not only learn individual skills but also an abstract understanding of how
to sequence these skills to perform extended tasks effectively. This paper
addresses this challenge by proposing a neuro-symbolic imitation learning
framework. Using task demonstrations, the system first learns a symbolic
representation that abstracts the low-level state-action space. The learned
representation decomposes a task into easier subtasks and allows the system to
leverage symbolic planning to generate abstract plans. Subsequently, the system
utilizes this task decomposition to learn a set of neural skills capable of
refining abstract plans into actionable robot commands. Experimental results in
three simulated robotic environments demonstrate that, compared to baselines,
our neuro-symbolic approach increases data efficiency, improves generalization
capabilities, and facilitates interpretability.
comment: IEEE International Conference on Robotics and Automation (ICRA) 2025
☆ AcL: Action Learner for Fault-Tolerant Quadruped Locomotion Control
Tianyu Xu, Yaoyu Cheng, Pinxi Shen, Lin Zhao (Electrical and Computer Engineering, and Mechanical Engineering, National University of Singapore)
Quadrupedal robots can learn versatile locomotion skills but remain
vulnerable when one or more joints lose power. In contrast, dogs and cats can
adopt limping gaits when injured, demonstrating their remarkable ability to
adapt to physical conditions. Inspired by such adaptability, this paper
presents Action Learner (AcL), a novel teacher-student reinforcement learning
framework that enables quadrupeds to autonomously adapt their gait for stable
walking under multiple joint faults. Unlike conventional teacher-student
approaches that enforce strict imitation, AcL leverages teacher policies to
generate style rewards, guiding the student policy without requiring precise
replication. We train multiple teacher policies, each corresponding to a
different fault condition, and subsequently distill them into a single student
policy with an encoder-decoder architecture. While prior works primarily
address single-joint faults, AcL enables quadrupeds to walk with up to four
faulty joints across one or two legs, autonomously switching between different
limping gaits when faults occur. We validate AcL on a real Go2 quadruped robot
under single- and double-joint faults, demonstrating fault-tolerant, stable
walking, smooth gait transitions between normal and limping gaits, and robustness
against external disturbances.
☆ A Data-Driven Method for INS/DVL Alignment
Autonomous underwater vehicles (AUVs) are sophisticated robotic platforms
crucial for a wide range of applications. The accuracy of AUV navigation
systems is critical to their success. Inertial sensors and Doppler velocity
logs (DVL) fusion is a promising solution for long-range underwater navigation.
However, the effectiveness of this fusion depends heavily on an accurate
alignment between the inertial sensors and the DVL. While current alignment
methods show promise, there remains significant room for improvement in terms
of accuracy, convergence time, and alignment trajectory efficiency. In this
research, we propose an end-to-end deep learning framework for the alignment
process. By leveraging deep-learning capabilities such as noise reduction and
the capture of nonlinearities in the data, we show on simulated data that our
proposed approach improves alignment accuracy and reduces convergence time
relative to current model-based methods.
☆ UGNA-VPR: A Novel Training Paradigm for Visual Place Recognition Based on Uncertainty-Guided NeRF Augmentation
Visual place recognition (VPR) is crucial for robots to identify previously
visited locations, playing an important role in autonomous navigation in both
indoor and outdoor environments. However, most existing VPR datasets are
limited to single-viewpoint scenarios, leading to reduced recognition accuracy,
particularly in multi-directional driving or feature-sparse scenes. Moreover,
obtaining additional data to mitigate these limitations is often expensive.
This paper introduces a novel training paradigm to improve the performance of
existing VPR networks by enhancing multi-view diversity within current datasets
through uncertainty estimation and NeRF-based data augmentation. Specifically,
we initially train NeRF using the existing VPR dataset. Then, our devised
self-supervised uncertainty estimation network identifies places with high
uncertainty. The poses of these uncertain places are input into NeRF to
generate new synthetic observations for further training of VPR networks.
Additionally, we propose an improved storage method for efficient organization
of augmented and original training data. We conducted extensive experiments on
three datasets and tested three different VPR backbone networks. The results
demonstrate that our proposed training paradigm significantly improves VPR
performance by fully utilizing existing data, outperforming other training
approaches. We further validated the effectiveness of our approach on
self-recorded indoor and outdoor datasets, consistently demonstrating superior
results. Our dataset and code have been released at
https://github.com/nubot-nudt/UGNA-VPR.
comment: Accepted to IEEE Robotics and Automation Letters (RA-L)
☆ Lidar-only Odometry based on Multiple Scan-to-Scan Alignments over a Moving Window
Lidar-only odometry considers the pose estimation of a mobile robot based on
the accumulation of motion increments extracted from consecutive lidar scans.
Many existing approaches to the problem use a scan-to-map registration, which
neglects the accumulation of errors within the maintained map due to drift.
Other methods use a refinement step that jointly optimizes the local map on a
feature basis. We propose a solution that avoids this by using multiple
independent scan-to-scan Iterative Closest Points (ICP) registrations to
previous scans in order to derive constraints for a pose graph. The
optimization of the pose graph then not only yields an accurate estimate for
the latest pose, but also enables the refinement of previous scans in the
optimization window. By avoiding the need to recompute the scan-to-scan
alignments, the computational load is minimized. Extensive evaluation on the
public KITTI and MulRan datasets as well as on a custom automotive lidar
dataset is carried out. Results show that the proposed approach achieves
state-of-the-art estimation accuracy, while alleviating the mentioned issues.
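The scan-to-scan registration that underpins this approach is standard point-to-point ICP: alternate nearest-neighbour matching with a closed-form (Kabsch) rigid alignment. A minimal 2D sketch — the paper's method additionally feeds the resulting relative poses into a pose graph as constraints, which is omitted here:

```python
import numpy as np

def best_rigid_transform(P, Q):
    """Closed-form (Kabsch) rotation R and translation t minimizing
    sum ||Q_i - (R P_i + t)||^2 for matched 2D point sets."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                   # cross-covariance of centred sets
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    return R, cq - R @ cp

def icp(src, dst, iters=30):
    """Point-to-point ICP: iterate nearest-neighbour matching and alignment.
    Returns R, t such that dst ~ src @ R.T + t."""
    R_tot, t_tot = np.eye(2), np.zeros(2)
    cur = src.copy()
    for _ in range(iters):
        # brute-force nearest neighbours (adequate for small toy scans)
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        R, t = best_rigid_transform(cur, dst[d2.argmin(axis=1)])
        cur = cur @ R.T + t
        R_tot, t_tot = R @ R_tot, R @ t_tot + t  # compose incremental transforms
    return R_tot, t_tot
```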
☆ An analysis of higher-order kinematics formalisms for an innovative surgical parallel robot
The paper presents a novel modular hybrid parallel robot for pancreatic
surgery and its higher-order kinematics derived based on various formalisms.
The classical vector, homogeneous transformation matrices and dual quaternion
approaches are studied for the kinematic functions using both classical
differentiation and multidual algebra. The algorithms for inverse kinematics
for all three studied formalisms are presented for both differentiation and
multidual algebra approaches. Furthermore, these algorithms are compared based
on numerical stability, execution times and number and type of mathematical
functions and operators contained in each algorithm. A statistical analysis
shows that there is significant improvement in execution time for the
algorithms implemented using multidual algebra, while the numerical stability
is appropriate for all algorithms derived based on differentiation and
multidual algebra. While the implementation of the kinematic algorithms using
multidual algebra shows positive results when benchmarked on a standard PC,
further work is required to evaluate the multidual algorithms on
hardware/software used for the modular parallel robot command and control.
☆ Haptic bilateral teleoperation system for free-hand dental procedures
Free-hand dental procedures are typically repetitive, time-consuming and
require high precision and manual dexterity. Dental robots can play a key role
in improving procedural accuracy and safety, enhancing patient comfort, and
reducing operator workload. However, robotic solutions for free-hand procedures
remain limited or completely lacking, and their acceptance is still low. To
address this gap, we develop a haptic bilateral teleoperation system (HBTS) for
free-hand dental procedures. The system includes a dedicated mechanical
end-effector, compatible with standard clinical tools, and equipped with an
endoscopic camera for improved visibility of the intervention site. By ensuring
motion and force correspondence between the operator's actions and the robot's
movements, monitored through visual feedback, we enhance the operator's sensory
awareness and motor accuracy. Furthermore, recognizing the need to ensure
procedural safety, we limit interaction forces by scaling the motion references
provided to the admittance controller based solely on measured contact forces.
This ensures effective force limitation in all contact states without requiring
prior knowledge of the environment. The proposed HBTS is validated in a dental
scaling procedure using a dental phantom. The results show that the system
improves the naturalness, safety, and accuracy of teleoperation, highlighting
its potential to enhance free-hand dental procedures.
comment: 12 pages, 12 figures
☆ Output-Feedback Boundary Control of Thermally and Flow-Induced Vibrations in Slender Timoshenko Beams
This work is motivated by the engineering challenge of suppressing vibrations
in turbine blades of aero engines, which often operate under extreme thermal
conditions and high-Mach aerodynamic environments that give rise to complex
vibration phenomena, commonly referred to as thermally-induced and flow-induced
vibrations. Using Hamilton's variational principle, the system is modeled as a
rotating slender Timoshenko beam under thermal and aerodynamic loads, described
by a mixed hyperbolic-parabolic PDE system where instabilities occur both
within the PDE domain and at the uncontrolled boundary, and the two types of
PDEs are cascaded in the domain. For such a system, we present the
state-feedback control design based on the PDE backstepping method. Recognizing
that the distributed temperature gradients and structural vibrations in the
Timoshenko beam are typically unmeasurable in practice, we design a state
observer for the mixed hyperbolic-parabolic PDE system. Based on this observer,
an output-feedback controller is then built to regulate the overall system
using only available boundary measurements. In the closed-loop system, the
state of the uncontrolled boundary, i.e., the furthest state from the control
input, is proved to be exponentially convergent to zero, and all signals are
proved as uniformly ultimately bounded. The proposed control design is
validated on an aero-engine flexible blade under extreme thermal and
aerodynamic conditions.
☆ OminiAdapt: Learning Cross-Task Invariance for Robust and Environment-Aware Robotic Manipulation
With the rapid development of embodied intelligence, leveraging large-scale
human data for high-level imitation learning on humanoid robots has become a
focal point of interest in both academia and industry. However, applying
humanoid robots to precision operation domains remains challenging due to the
complexities they face in perception and control processes, the long-standing
physical differences in morphology and actuation mechanisms between humanoid
robots and humans, and the lack of task-relevant features obtained from
egocentric vision. To address the issue of covariate shift in imitation
learning, this paper proposes an imitation learning algorithm tailored for
humanoid robots. By focusing on the primary task objectives, filtering out
background information, and incorporating channel feature fusion with spatial
attention mechanisms, the proposed algorithm suppresses environmental
disturbances and utilizes a dynamic weight update strategy to significantly
improve the success rate of humanoid robots in accomplishing target tasks.
Experimental results demonstrate that the proposed method exhibits robustness
and scalability across various typical task scenarios, providing new ideas and
approaches for autonomous learning and control in humanoid robots. The project
will be open-sourced on GitHub.
☆ Dimensional optimization of single-DOF planar rigid link-flapping mechanisms for high lift and low power
Rigid link flapping mechanisms remain the most practical choice for flapping
wing micro-aerial vehicles (MAVs) to carry useful payloads and onboard
batteries for free flight due to their long-term durability and reliability.
However, to achieve insect-like agility and maneuverability, MAVs with
these mechanisms require significant weight reduction. One approach involves
using single-DOF planar rigid linkages, which are rarely optimized
dimensionally for high lift and low power so that smaller motors and batteries
could be used. We integrated a mechanism simulator based on a quasistatic
nonlinear finite element method with an unsteady vortex lattice method-based
aerodynamic analysis tool within an optimization routine. We optimized three
different mechanism topologies from the literature. As a result, power savings
of up to 42% were observed in some cases, due to increased amplitude
and higher lift coefficients resulting from optimized asymmetric sweeping
velocity profiles. We also conducted an uncertainty analysis that revealed the
need for high manufacturing tolerances to ensure reliable mechanism
performance. The presented unified computational tool also facilitates the
optimal selection of MAV components based on the payload and flight time
requirements.
☆ TAGA: A Tangent-Based Reactive Approach for Socially Compliant Robot Navigation Around Human Groups IROS
Robot navigation in densely populated environments presents significant
challenges, particularly regarding the interplay between individual and group
dynamics. Current navigation models predominantly address interactions with
individual pedestrians while failing to account for human groups that naturally
form in real-world settings. Conversely, the limited models implementing
group-aware navigation typically prioritize group dynamics at the expense of
individual interactions, both of which are essential for socially appropriate
navigation. This research extends an existing simulation framework to
incorporate both individual pedestrians and human groups. We present Tangent
Action for Group Avoidance (TAGA), a modular reactive mechanism that can be
integrated with existing navigation frameworks to enhance their group-awareness
capabilities. TAGA dynamically modifies robot trajectories using tangent
action-based avoidance strategies while preserving the underlying model's
capacity to navigate around individuals. Additionally, we introduce Group
Collision Rate (GCR), a novel metric to quantitatively assess how effectively
robots maintain group integrity during navigation. Through comprehensive
simulation-based benchmarking, we demonstrate that integrating TAGA with
state-of-the-art navigation models (ORCA, Social Force, DS-RNN, and AG-RL)
reduces group intrusions by 45.7-78.6% while maintaining comparable success
rates and navigation efficiency. Future work will focus on real-world
implementation and validation of this approach.
comment: 6 pages, 3 figures. Submitted as a conference paper in IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS), 2025
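The Group Collision Rate described above counts how often the robot intrudes into a group's space over a trajectory. The sketch below uses our own simplification — each group is approximated by a disc around its members' centroid plus a personal-space margin — and is not the paper's exact definition:

```python
import numpy as np

def group_collision_rate(robot_traj, group_trajs, margin=0.3):
    """Fraction of timesteps at which the robot enters any group's region.
    robot_traj: (T, 2); each entry of group_trajs: (T, n_members, 2).
    A group's region at time t is a disc at the member centroid whose radius
    is the largest member offset plus a personal-space margin (a hypothetical
    simplification of the paper's metric)."""
    T = len(robot_traj)
    hits = 0
    for t in range(T):
        for members in group_trajs:
            pts = members[t]
            c = pts.mean(axis=0)
            rad = np.linalg.norm(pts - c, axis=1).max() + margin
            if np.linalg.norm(robot_traj[t] - c) < rad:
                hits += 1
                break              # count each timestep at most once
    return hits / T
```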
☆ Safe Human Robot Navigation in Warehouse Scenario
The integration of autonomous mobile robots (AMRs) in industrial
environments, particularly warehouses, has revolutionized logistics and
operational efficiency. However, ensuring the safety of human workers in
dynamic, shared spaces remains a critical challenge. This work proposes a novel
methodology that leverages control barrier functions (CBFs) to enhance safety
in warehouse navigation. By integrating learning-based CBFs with the Open
Robotics Middleware Framework (OpenRMF), the system achieves adaptive and
safety-enhanced controls in multi-robot, multi-agent scenarios. Experiments
conducted using various robot platforms demonstrate the efficacy of the
proposed approach in avoiding static and dynamic obstacles, including human
pedestrians. Our experiments cover scenarios in which the number of robots,
the robot platforms, speeds, and the number of obstacles are varied, and the
proposed approach achieves promising performance across all of them.
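A control-barrier-function safety filter of the kind used here minimally perturbs a nominal command so that the barrier condition ḣ ≥ −αh holds. For a single-integrator robot and one circular obstacle the underlying QP has a closed form; this toy sketch is generic, not the learning-based CBFs of the paper:

```python
import numpy as np

def cbf_filter(u_nom, x, obstacle, d_safe, alpha=1.0):
    """Project u_nom onto the safe set of the control barrier function
    h(x) = ||x - o||^2 - d_safe^2 for single-integrator dynamics xdot = u.
    One linear constraint a.u >= b, so the QP reduces to a closed form."""
    diff = x - obstacle
    h = diff @ diff - d_safe ** 2
    a = 2.0 * diff             # gradient of h, so hdot = a . u
    b = -alpha * h             # enforce hdot >= -alpha * h
    if a @ u_nom >= b:
        return u_nom           # nominal command is already safe
    return u_nom + (b - a @ u_nom) / (a @ a) * a   # minimal-norm correction
```

A safe nominal command passes through unchanged; an unsafe one is braked just enough to keep the barrier condition active.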
☆ Fuzzy-Logic-based model predictive control: A paradigm integrating optimal and common-sense decision making
This paper introduces a novel concept, fuzzy-logic-based model predictive
control (FLMPC), along with a multi-robot control approach for exploring
unknown environments and locating targets. Traditional model predictive control
(MPC) methods rely on Bayesian theory to represent environmental knowledge and
optimize a stochastic cost function, often leading to high computational costs
and limited effectiveness in locating all targets. Our approach instead
leverages FLMPC and extends it to a bi-level parent-child architecture for
enhanced coordination and an extended decision-making horizon. Extracting
high-level information from probability distributions and local observations,
FLMPC simplifies the optimization problem and significantly extends its
operational horizon compared to other MPC methods. We conducted extensive
simulations in unknown 2-dimensional environments with randomly placed
obstacles and humans. We compared the performance and computation time of FLMPC
against MPC with a stochastic cost function, then evaluated the impact of
integrating the high-level parent FLMPC layer. The results indicate that our
approaches significantly improve both performance and computation time,
enhancing coordination of robots and reducing the impact of uncertainty in
large-scale search and rescue environments.
comment: 50 pages, 8 figures, 3 tables
♻ ☆ Immersive and Wearable Thermal Rendering for Augmented Reality
In augmented reality (AR), where digital content is overlaid onto the real
world, realistic thermal feedback has been shown to enhance immersion. Yet
current thermal feedback devices, heavily influenced by the needs of virtual
reality, often hinder physical interactions and are ineffective for immersion
in AR. To bridge this gap, we have identified three design considerations
relevant for AR thermal feedback: indirect feedback to maintain dexterity,
thermal passthrough to preserve real-world temperature perception, and
spatiotemporal rendering for dynamic sensations. We then created a unique and
innovative thermal feedback device that satisfies these criteria. Human subject
experiments assessing perceptual sensitivity, object temperature matching,
spatial pattern recognition, and moving thermal stimuli demonstrated the impact
of our design, enabling realistic temperature discrimination, virtual object
perception, and enhanced immersion. These findings demonstrate that carefully
designed thermal feedback systems can bridge the sensory gap between physical
and virtual interactions, enhancing AR realism and usability.
♻ ☆ Model-Predictive Trajectory Generation for Aerial Search and Coverage
This paper introduces a trajectory planning algorithm for search and coverage
missions with an Unmanned Aerial Vehicle (UAV) based on an uncertainty map that
represents prior knowledge of the target region, modeled by a Gaussian Mixture
Model (GMM). The trajectory planning problem is formulated as an Optimal
Control Problem (OCP), which aims to maximize the uncertainty reduction within
a specified mission duration. However, this results in an intractable OCP whose
objective functional cannot be expressed in closed form. To address this, we
propose a Model Predictive Control (MPC) algorithm based on a relaxed
formulation of the objective function to approximate the optimal solutions.
This relaxation promotes efficient map exploration by penalizing overlaps in
the UAV's visibility regions along the trajectory. The algorithm can produce
efficient and smooth trajectories, and it can be implemented using standard
Nonlinear Programming solvers, making it suitable for real-time planning.
Unlike traditional methods, which often rely on discretizing the mission space
and using complex mixed-integer formulations, our approach is computationally
efficient and easier to implement. The MPC algorithm is initially assessed in
MATLAB, followed by Gazebo simulations and actual experimental tests conducted
in an outdoor environment. The results demonstrate that the proposed strategy
can generate efficient and smooth trajectories for search and coverage
missions.
♻ ☆ Towards Optimizing a Convex Cover of Collision-Free Space for Trajectory Generation
We propose an online iterative algorithm to optimize a convex cover to
under-approximate the free space for autonomous navigation to delineate Safe
Flight Corridors (SFC). The convex cover consists of a set of polytopes such
that the union of the polytopes represents obstacle-free space, allowing us to
find trajectories for robots that lie within the convex cover. In order to find
the SFC that facilitates trajectory optimization, we iteratively find
overlapping polytopes of maximum volumes that include specified waypoints
initialized by a geometric or kinematic planner. Constraints at waypoints
appear in two alternating stages of a joint optimization problem, which is
solved by a novel heuristic-based iterative algorithm with partially
distributed variables. We validate the effectiveness of our proposed algorithm
using a range of parameterized environments and show its applications for
two-stage motion planning.
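The key property of such a convex cover is easy to state: every trajectory sample must lie in at least one polytope {x : Ax ≤ b} of the corridor. A small sketch of that containment check (helper names are illustrative, not from the paper):

```python
import numpy as np

def in_polytope(x, A, b, tol=1e-9):
    """Point x lies in the polytope {y : A y <= b}."""
    return bool(np.all(A @ x <= b + tol))

def covered_by_sfc(traj, polytopes):
    """True iff every trajectory sample is inside at least one polytope,
    i.e. the trajectory stays within the Safe Flight Corridor."""
    return all(any(in_polytope(x, A, b) for A, b in polytopes) for x in traj)
```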
♻ ☆ How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey
Fabio Tosi, Youmin Zhang, Ziren Gong, Erik Sandström, Stefano Mattoccia, Martin R. Oswald, Matteo Poggi
Over the past two decades, research in the field of Simultaneous Localization
and Mapping (SLAM) has undergone a significant evolution, highlighting its
critical role in enabling autonomous exploration of unknown environments. This
evolution ranges from hand-crafted methods, through the era of deep learning,
to more recent developments focused on Neural Radiance Fields (NeRFs) and 3D
Gaussian Splatting (3DGS) representations. Recognizing the growing body of
research and the absence of a comprehensive survey on the topic, this paper
aims to provide the first comprehensive overview of SLAM progress through the
lens of the latest advancements in radiance fields. It sheds light on the
background, evolutionary path, inherent strengths and limitations, and serves
as a fundamental reference to highlight the dynamic progress and specific
challenges.
comment: Updated to November 2024
♻ ☆ Efficient Continual Adaptation of Pretrained Robotic Policy with Online Meta-Learned Adapters
Continual adaptation is essential for general autonomous agents. For example,
a household robot pretrained with a repertoire of skills must still adapt to
unseen tasks specific to each household. Motivated by this, building upon
parameter-efficient fine-tuning in language models, prior works have explored
lightweight adapters to adapt pretrained policies, which can preserve learned
features from the pretraining phase and demonstrate good adaptation
performances. However, these approaches treat task learning separately,
limiting knowledge transfer between tasks. In this paper, we propose Online
Meta-Learned adapters (OMLA). Instead of applying adapters directly, OMLA can
facilitate knowledge transfer from previously learned tasks to current learning
tasks through a novel meta-learning objective. Extensive experiments in both
simulated and real-world environments demonstrate that OMLA can lead to better
adaptation performances compared to the baseline methods. The project link:
https://ricky-zhu.github.io/OMLA/.
comment: Project link: https://ricky-zhu.github.io/OMLA/
♻ ☆ Integrating Naturalistic Insights in Objective Multi-Vehicle Safety Framework
As autonomous vehicle technology advances, the precise assessment of safety
in complex traffic scenarios becomes crucial, especially in mixed-vehicle
environments where human perception of safety must be taken into account. This
paper presents a framework designed for assessing traffic safety in
multi-vehicle situations, facilitating the simultaneous utilization of diverse
objective safety metrics. Additionally, it allows the integration of subjective
perception of safety by adjusting model parameters. The framework was applied
to evaluate various model configurations in car-following scenarios on a
highway, utilizing naturalistic driving datasets. The model evaluation showed
outstanding performance, particularly when multiple objective safety measures
were integrated, and performance improved further when all surrounding vehicles
were considered.
♻ ☆ Online POMDP Planning with Anytime Deterministic Guarantees
Decision-making under uncertainty is a critical aspect of many practical
autonomous systems due to incomplete information. Partially Observable Markov
Decision Processes (POMDPs) offer a mathematically principled framework for
formulating decision-making problems under such conditions. However, finding an
optimal solution for a POMDP is generally intractable. In recent years, there
has been significant progress in scaling approximate solvers from small to
moderately sized problems using online tree search. However, such solvers
typically offer only probabilistic or asymptotic guarantees with respect to
the optimal solution. In this paper, we derive a deterministic
relationship for discrete POMDPs between an approximated and the optimal
solution. We show that, at any time, bounds can be derived relating the
current solution to the optimal one. Our derivations provide
an avenue for a new set of algorithms and can be attached to existing
algorithms that have a certain structure to provide them with deterministic
guarantees with marginal computational overhead. In return, not only do we
certify the solution quality, but we demonstrate that making a decision based
on the deterministic guarantee may result in superior performance compared to
the original algorithm without the deterministic certification.
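To make the flavor of "deterministic anytime bounds" concrete, here is a hedged toy (not the paper's derivation): if a tree search has explored only part of the observation mass under each action, the unexplored remainder can be bounded by the extreme discounted returns, giving hard lower/upper bounds, and acting on the guaranteed lower bound can differ from acting on the point estimate.

```python
def guaranteed_action(action_stats, r_min, r_max, gamma, horizon):
    """Choose the action with the highest deterministic lower bound.

    action_stats maps an action to (v_hat, mass): the value estimated
    from explored observation branches and the probability mass of those
    branches. The unexplored mass is bounded by the extreme discounted
    returns, so the bounds hold deterministically, not just with high
    probability."""
    v_lo = r_min * (1 - gamma**horizon) / (1 - gamma)  # worst possible return
    v_hi = r_max * (1 - gamma**horizon) / (1 - gamma)  # best possible return
    bounds = {a: (m * v + (1 - m) * v_lo, m * v + (1 - m) * v_hi)
              for a, (v, m) in action_stats.items()}
    best = max(bounds, key=lambda a: bounds[a][0])
    return best, bounds

# "b" looks better by point estimate, but "a" carries the stronger guarantee.
stats = {"a": (5.0, 0.9), "b": (6.0, 0.5)}
best, bounds = guaranteed_action(stats, r_min=0.0, r_max=10.0,
                                 gamma=0.9, horizon=10)
```

Here the thinly explored action "b" has a weaker worst-case bound despite its higher estimate, so the certified choice is "a".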
♻ ☆ Risk-Aware Reinforcement Learning for Autonomous Driving: Improving Safety When Driving through Intersection
Applying reinforcement learning to autonomous driving has garnered widespread
attention. However, classical reinforcement learning methods optimize policies
by maximizing expected rewards but lack sufficient safety considerations, often
putting agents in hazardous situations. This paper proposes a risk-aware
reinforcement learning approach for autonomous driving to improve the safety
performance when crossing intersections. Safe critics are constructed to
evaluate driving risk and work in conjunction with the reward critic to update
the actor. Based on this, a Lagrangian relaxation method and cyclic gradient
iteration are combined to project actions into a feasible safe region.
Furthermore, a Multi-hop and Multi-Layer Perceptron (MLP) mixed Attention
Mechanism (MMAM) is incorporated into the actor-critic network, enabling the
policy to adapt to dynamic traffic and overcome permutation sensitivity
challenges. This allows the policy to focus more effectively on surrounding
potential risks while enhancing the identification of passing opportunities.
Simulation tests are conducted on different tasks at unsignalized
intersections. The results show that the proposed approach effectively reduces
collision rates and improves crossing efficiency in comparison to baseline
algorithms. Additionally, our ablation experiments demonstrate the benefits of
incorporating risk-awareness and MMAM into RL.
comment: 11 pages, 10 figures
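The Lagrangian relaxation mentioned in this abstract follows a standard constrained-RL pattern; as a hedged illustration (the paper's exact update rule and critic structure are not given here), the dual variable trading off reward against risk can be updated by projected dual ascent:

```python
def lagrangian_update(lmbda, risk_value, risk_limit, lr=0.01):
    """Projected dual ascent on the Lagrange multiplier: increase the
    safety weight while the safe critic's risk estimate exceeds the
    allowed limit, decrease it otherwise, and clip at zero to keep it a
    valid multiplier."""
    return max(0.0, lmbda + lr * (risk_value - risk_limit))

# Risk above the limit -> the safety term gets weighted more heavily.
lam = lagrangian_update(0.5, risk_value=2.0, risk_limit=1.0)
```

Over training, this drives the policy toward the feasible region: the multiplier grows only as long as the constraint is violated.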
♻ ☆ Constrained Nonlinear Kaczmarz Projection on Intersections of Manifolds for Coordinated Multi-Robot Mobile Manipulation ICRA
Cooperative manipulation tasks impose various structure-, task-, and
robot-specific constraints on mobile manipulators. However, current methods
struggle to model and solve these myriad constraints simultaneously. We propose
a twofold solution: first, we model constraints as a family of manifolds
amenable to simultaneous solving. Second, we introduce the constrained
nonlinear Kaczmarz (cNKZ) projection technique to produce constraint-satisfying
solutions. Experiments show that cNKZ dramatically outperforms baseline
approaches, which cannot find solutions at all. We integrate cNKZ with a
sampling-based motion planning algorithm to generate complex, coordinated
motions for 3 to 6 mobile manipulators (18--36 DoF), with cNKZ solving up to 80
nonlinear constraints simultaneously and achieving up to a 92% success rate in
cluttered environments. We also demonstrate our approach on hardware using
three Turtlebot3 Waffle Pi robots with OpenMANIPULATOR-X arms.
comment: Accepted for publication at IEEE International Conference on Robotics
and Automation (ICRA) 2025
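The Kaczmarz-style projection underlying cNKZ can be sketched in its classical form (a toy, not the authors' constrained variant): cycle over scalar constraints f_i(x) = 0 and take a Newton-like step along each gradient until every constraint is satisfied.

```python
import numpy as np

def nonlinear_kaczmarz(x, constraints, grads, iters=200, tol=1e-10):
    """Cyclically project x toward the zero set of each scalar constraint
    by stepping along the constraint gradient a Newton-like amount, the
    core idea behind nonlinear Kaczmarz iteration."""
    for _ in range(iters):
        max_viol = 0.0
        for f, g in zip(constraints, grads):
            v = f(x)
            max_viol = max(max_viol, abs(v))
            grad = g(x)
            denom = grad @ grad
            if denom > 0:
                x = x - (v / denom) * grad   # project onto this constraint
        if max_viol < tol:
            break
    return x

# Toy intersection of manifolds: the unit circle and the line x = y.
circle = lambda p: p[0]**2 + p[1]**2 - 1.0
line = lambda p: p[0] - p[1]
d_circle = lambda p: np.array([2 * p[0], 2 * p[1]])
d_line = lambda p: np.array([1.0, -1.0])

sol = nonlinear_kaczmarz(np.array([2.0, 0.5]),
                         [circle, line], [d_circle, d_line])
```

Starting from an infeasible point, the iterate converges to a point on the intersection of both manifolds, here near (1/sqrt(2), 1/sqrt(2)).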
♻ ☆ MUSE: A Real-Time Multi-Sensor State Estimator for Quadruped Robots
This paper introduces an innovative state estimator, MUSE (MUlti-sensor State
Estimator), designed to enhance the accuracy and real-time performance of
state estimation in quadruped robot navigation. The proposed state estimator builds
upon our previous work presented in [1]. It integrates data from a range of
onboard sensors, including IMUs, encoders, cameras, and LiDARs, to deliver a
comprehensive and reliable estimation of the robot's pose and motion, even in
slippery scenarios. We tested MUSE on a Unitree Aliengo robot, successfully
closing the locomotion control loop in difficult scenarios, including slippery
and uneven terrain. Benchmarking against Pronto [2] and VILENS [3] showed 67.6%
and 26.7% reductions in translational errors, respectively. Additionally, MUSE
outperformed DLIO [4], a LiDAR-inertial odometry system, in rotational errors
and frequency, while the proprioceptive version of MUSE (P-MUSE) outperformed
TSIF [5], with a 45.9% reduction in absolute trajectory error (ATE).
comment: Accepted for publication in IEEE Robotics and Automation Letters
♻ ☆ Mirroring the Parking Target: An Optimal-Control-Based Parking Motion Planner with Strengthened Parking Reliability and Faster Parking Completion
Automated Parking Assist (APA) systems currently face low adoption, largely
due to users' concerns about parking capability, reliability, and completion
efficiency. To upgrade conventional APA planners and enhance users'
acceptance, this research proposes an optimal-control-based parking motion
planner. Its highlight lies in its control logic: planning trajectories by
mirroring the parking target. This method enables: i) parking capability in
narrow spaces; ii) better parking reliability by expanding the Operational
Design Domain (ODD); iii) faster completion of the parking process; iv)
enhanced computational efficiency; v) universality across all parking types. A
comprehensive evaluation is conducted. Results demonstrate that the proposed
planner enhances the parking success rate by 40.6%, improves parking
completion efficiency by 18.0%, and expands the ODD by 86.1%. It shows its
superiority in difficult parking cases, such as the parallel parking scenario
and narrow spaces. Moreover, the average computation time of the proposed
planner is 74 milliseconds. Results indicate that the proposed planner is ready
for real-time commercial applications.
comment: IEEE Transactions on Intelligent Transportation Systems (2024)
♻ ☆ Safety-Aware Human-Lead Vehicle Platooning by Proactively Reacting to Uncertain Human Behaviors
Human-Lead Cooperative Adaptive Cruise Control (HL-CACC) is regarded as a
promising vehicle platooning technology in real-world implementation. By
utilizing a Human-driven Vehicle (HV) as the platoon leader, HL-CACC reduces
the cost and enhances the reliability of perception and decision-making.
However, state-of-the-art HL-CACC technology still has a major limitation on
driving safety because it does not account for the leading human driver's
uncertain behavior. In this study, an HL-CACC controller is designed based on
Stochastic Model Predictive Control (SMPC). It is enabled to predict the
driving intention of the leading Connected Human-Driven Vehicle (CHV). The
proposed controller has the following features: i) enhanced perceived safety in
oscillating traffic; ii) guaranteed safety against hard brakes; iii)
computational efficiency for real-time implementation. The proposed controller
is evaluated on a PreScan&Simulink simulation platform. Real vehicle trajectory
data is collected for the calibration of the simulation. Results reveal that
the proposed controller: i) improves perceived safety by 19.17% in oscillating
traffic; ii) enhances actual safety by 7.76% against hard brakes; and iii) is
confirmed to be string stable. The computation time is approximately 3.2
milliseconds when running on a laptop equipped with an Intel i5-13500H CPU.
This indicates the proposed controller is ready for real-time implementation.
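A common way SMPC handles an uncertain leader, and a plausible ingredient of the "guaranteed safety against hard brakes" feature above (offered here as a hedged illustration, not the paper's formulation), is chance-constraint tightening: a probabilistic gap requirement under Gaussian prediction error is replaced by a deterministically enlarged bound.

```python
def tightened_gap(d_min, sigma, z_alpha=1.645):
    """Chance-constraint tightening, a standard SMPC device: to enforce
    P(gap >= d_min) >= 95% when the predicted gap has Gaussian error with
    standard deviation sigma, plan against the nominal bound enlarged by
    the 95% quantile of that error."""
    return d_min + z_alpha * sigma

# A 2 m prediction std turns a 5 m nominal gap into an 8.29 m planning gap.
planning_gap = tightened_gap(5.0, 2.0)
```

The controller then solves an ordinary deterministic MPC problem against the tightened gap, which keeps the online optimization cheap.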
♻ ☆ AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning
This paper presents AlphaSpace, a novel methodology designed to enhance the
spatial reasoning capabilities of language models for robotic manipulation in
3D Cartesian space. AlphaSpace employs a hierarchical semantics-based
tokenization strategy that encodes spatial information at both coarse and
fine-grained levels. Our approach represents objects with their attributes,
positions, and height information through structured tokens, enabling precise
spatial reasoning without relying on traditional vision-based embeddings. This
approach enables LLMs to accurately manipulate objects by positioning them at
specific (x, y, z) coordinates. Experimental results suggest that AlphaSpace
shows promise for manipulation tasks, achieving a total accuracy of 66.67%,
compared to 37.5% for GPT-4o and 29.17% for Claude 3.5 Sonnet. These results
highlight the potential of structured spatial encoding for manipulation and
warrant further exploration.
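To make "coarse and fine-grained" spatial tokens concrete, here is a hedged toy scheme (the token names, grid sizes, and workspace extent are invented for illustration; AlphaSpace's actual vocabulary is not specified in the abstract): each axis is quantized into coarse cells, then the residual within the cell into fine sub-cells.

```python
def tokenize_position(x, y, z, coarse=10, fine=10, extent=100.0):
    """Hierarchical coarse-to-fine tokens for a point in a cubic
    workspace of side `extent`: each axis yields a coarse-cell token and
    a fine token for the residual inside that cell."""
    tokens = []
    for v, name in ((x, "x"), (y, "y"), (z, "z")):
        cell = extent / coarse
        c = min(int(v // cell), coarse - 1)            # coarse cell index
        residual = v - c * cell
        f = min(int(residual // (cell / fine)), fine - 1)  # fine sub-cell
        tokens += [f"<{name}_coarse_{c}>", f"<{name}_fine_{f}>"]
    return tokens

tokens = tokenize_position(37.0, 5.0, 99.0)
```

Two tokens per axis give centimeter-scale resolution on a 1 m workspace with only 2 x 10 symbols per axis, which is the appeal of hierarchical quantization for language-model action spaces.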
♻ ☆ LaMOuR: Leveraging Language Models for Out-of-Distribution Recovery in Reinforcement Learning
Deep Reinforcement Learning (DRL) has demonstrated strong performance in
robotic control but remains susceptible to out-of-distribution (OOD) states,
often resulting in unreliable actions and task failure. While previous methods
have focused on minimizing or preventing OOD occurrences, they largely neglect
recovery once an agent encounters such states. Although recent methods have
attempted to address this by guiding agents back to in-distribution states,
their reliance on uncertainty estimation hinders scalability in complex
environments. To overcome this limitation, we introduce Language Models for
Out-of-Distribution Recovery (LaMOuR), which enables recovery learning without
relying on uncertainty estimation. LaMOuR generates dense reward codes that
guide the agent back to a state where it can successfully perform its original
task, leveraging the capabilities of LVLMs in image description, logical
reasoning, and code generation. Experimental results show that LaMOuR
substantially enhances recovery efficiency across diverse locomotion tasks and
even generalizes effectively to complex environments, including humanoid
locomotion and mobile manipulation, where existing methods struggle. The code
and supplementary materials are available at https://lamour-rl.github.io/.
comment: This paper is currently under security review and will be re-released
once the review is complete
♻ ☆ DexForce: Extracting Force-informed Actions from Kinesthetic Demonstrations for Dexterous Manipulation
Imitation learning requires high-quality demonstrations consisting of
sequences of state-action pairs. For contact-rich dexterous manipulation
tasks, the actions in these state-action pairs must produce the right forces.
Current widely used methods for collecting dexterous
manipulation demonstrations are difficult to use for demonstrating contact-rich
tasks due to unintuitive human-to-robot motion retargeting and the lack of
direct haptic feedback. Motivated by these concerns, we propose DexForce.
DexForce leverages contact forces, measured during kinesthetic demonstrations,
to compute force-informed actions for policy learning. We collect
demonstrations for six tasks and show that policies trained on our
force-informed actions achieve an average success rate of 76% across all tasks.
In contrast, policies trained directly on actions that do not account for
contact forces have near-zero success rates. We also conduct a study ablating
the inclusion of force data in policy observations. We find that while using
force data never hurts policy performance, it helps most for tasks that require
advanced levels of precision and coordination, like opening an AirPods case and
unscrewing a nut.
comment: Videos can be found here:
https://clairelc.github.io/dexforce.github.io/
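One simple way a measured contact force can be folded into a position action, offered as a hedged sketch rather than DexForce's actual computation, follows the impedance-control identity: a controller with stiffness k tracking target q + f/k exerts roughly force f at pose q.

```python
import numpy as np

def force_informed_action(q_measured, f_contact, stiffness=200.0):
    """Toy force-informed action: under an impedance controller with
    stiffness k (N/m), commanding the target q + f/k reproduces the
    demonstrated contact force f at the demonstrated pose q."""
    return np.asarray(q_measured) + np.asarray(f_contact) / stiffness

# A 4 N contact force along -z shifts the commanded z target by 2 cm,
# so the policy learns targets that actually press into the contact.
action = force_informed_action([0.1, 0.2, 0.3], [2.0, 0.0, -4.0])
```

The point of the transformation is that naive position targets recorded during kinesthetic teaching sit *at* the contact and produce near-zero force when replayed, whereas the offset targets do not.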
♻ ★ GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
NVIDIA, :, Johan Bjorck, Fernando Castañeda, Nikita Cherniadev, Xingye Da, Runyu Ding, Linxi "Jim" Fan, Yu Fang, Dieter Fox, Fengyuan Hu, Spencer Huang, Joel Jang, Zhenyu Jiang, Jan Kautz, Kaushil Kundalia, Lawrence Lao, Zhiqi Li, Zongyu Lin, Kevin Lin, Guilin Liu, Edith Llontop, Loic Magne, Ajay Mandlekar, Avnish Narayan, Soroush Nasiriany, Scott Reed, You Liang Tan, Guanzhi Wang, Zu Wang, Jing Wang, Qi Wang, Jiannan Xiang, Yuqi Xie, Yinzhen Xu, Zhenjia Xu, Seonghyeon Ye, Zhiding Yu, Ao Zhang, Hao Zhang, Yizhou Zhao, Ruijie Zheng, Yuke Zhu
General-purpose robots need a versatile body and an intelligent mind. Recent
advancements in humanoid robots have shown great promise as a hardware platform
for building generalist autonomy in the human world. A robot foundation model,
trained on massive and diverse data sources, is essential for enabling the
robots to reason about novel situations, robustly handle real-world
variability, and rapidly learn new tasks. To this end, we introduce GR00T N1,
an open foundation model for humanoid robots. GR00T N1 is a
Vision-Language-Action (VLA) model with a dual-system architecture. The
vision-language module (System 2) interprets the environment through vision and
language instructions. The subsequent diffusion transformer module (System 1)
generates fluid motor actions in real time. Both modules are tightly coupled
and jointly trained end-to-end. We train GR00T N1 with a heterogeneous mixture
of real-robot trajectories, human videos, and synthetically generated datasets.
We show that our generalist robot model GR00T N1 outperforms the
state-of-the-art imitation learning baselines on standard simulation benchmarks
across multiple robot embodiments. Furthermore, we deploy our model on the
Fourier GR-1 humanoid robot for language-conditioned bimanual manipulation
tasks, achieving strong performance with high data efficiency.
comment: Authors are listed alphabetically. Project leads are Linxi "Jim" Fan
and Yuke Zhu. For more information, see
https://developer.nvidia.com/isaac/gr00t
♻ ☆ AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation
Performing general language-conditioned bimanual manipulation tasks is of
great importance for many applications ranging from household service to
industrial assembly. However, collecting bimanual manipulation data is
expensive due to the high-dimensional action space, which poses challenges for
conventional methods to handle general bimanual manipulation tasks. In
contrast, unimanual policies have recently demonstrated impressive
generalizability across a wide range of tasks, thanks to scaled model
parameters and training data, and can provide sharable manipulation knowledge
for bimanual systems. To this end, we propose a plug-and-play method named
AnyBimanual, which transfers pre-trained unimanual policy to general bimanual
manipulation policy with few bimanual demonstrations. Specifically, we first
introduce a skill manager to dynamically schedule the skill representations
discovered from pre-trained unimanual policy for bimanual manipulation tasks,
which linearly combines skill primitives with task-oriented compensation to
represent the bimanual manipulation instruction. To mitigate the observation
discrepancy between unimanual and bimanual systems, we present a visual
aligner that generates soft masks for the visual embedding of the workspace,
aiming to align each arm's visual input to the unimanual policy model with
that seen during the pretraining stage. AnyBimanual shows superiority on 12 simulated tasks from
RLBench2 with a sizable 12.67% improvement in success rate over previous
methods. Experiments on 9 real-world tasks further verify its practicality with
an average success rate of 84.62%.
comment: Project page: https://anybimanual.github.io/
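The "linearly combines skill primitives with task-oriented compensation" step above reduces to a weighted sum; the sketch below is illustrative only (the primitive representation, weighting network, and compensation term are all assumptions, not AnyBimanual's implementation).

```python
import numpy as np

def combine_skills(primitives, weights, compensation):
    """Linearly combine pretrained skill primitives with a task-oriented
    compensation vector, the kind of scheduling a skill manager could do
    to build a bimanual representation from unimanual skills."""
    primitives = np.asarray(primitives)          # (num_skills, dim)
    return np.asarray(weights) @ primitives + np.asarray(compensation)

prims = np.array([[1.0, 0.0], [0.0, 1.0]])       # two toy skill primitives
rep = combine_skills(prims, weights=[0.7, 0.3], compensation=[0.0, 0.1])
```

In a real system the weights would come from a learned scheduler conditioned on the instruction, not be hand-set as here.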
♻ ☆ SyncDiff: Synchronized Motion Diffusion for Multi-Body Human-Object Interaction Synthesis
Synthesizing realistic human-object interaction motions is a critical problem
in VR/AR and human animation. Unlike the commonly studied scenarios involving a
single human or hand interacting with one object, we address a more generic
multi-body setting with arbitrary numbers of humans, hands, and objects. This
complexity introduces significant challenges in synchronizing motions due to
the high correlations and mutual influences among bodies. To address these
challenges, we introduce SyncDiff, a novel method for multi-body interaction
synthesis using a synchronized motion diffusion strategy. SyncDiff employs a
single diffusion model to capture the joint distribution of multi-body motions.
To enhance motion fidelity, we propose a frequency-domain motion decomposition
scheme. Additionally, we introduce a new set of alignment scores to emphasize
the synchronization of different body motions. SyncDiff jointly optimizes both
data sample likelihood and alignment likelihood through an explicit
synchronization strategy. Extensive experiments across four datasets with
various multi-body configurations demonstrate the superiority of SyncDiff over
existing state-of-the-art motion synthesis methods.
comment: 26 pages, 10 figures
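The frequency-domain motion decomposition mentioned above can be illustrated with a plain FFT split (a generic sketch; SyncDiff's actual scheme is not specified in the abstract): low and high bands are separated along the time axis and sum back to the original trajectory exactly, so the decomposition loses no information.

```python
import numpy as np

def frequency_decompose(motion, cutoff):
    """Split a motion trajectory (T x D array) into low- and
    high-frequency parts with an FFT along the time axis; the two parts
    sum back to the input exactly."""
    spec = np.fft.rfft(motion, axis=0)
    low_spec = spec.copy()
    low_spec[cutoff:] = 0.0             # keep only the slow components
    high_spec = spec - low_spec         # everything above the cutoff
    n = motion.shape[0]
    low = np.fft.irfft(low_spec, n=n, axis=0)
    high = np.fft.irfft(high_spec, n=n, axis=0)
    return low, high

rng = np.random.default_rng(0)
motion = rng.normal(size=(32, 3))       # a toy 32-frame, 3-DoF trajectory
low, high = frequency_decompose(motion, cutoff=4)
```

Supervising the bands separately lets a model weight slow, large-scale coordination differently from fast, fine detail, which is the usual motivation for such splits in motion synthesis.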